One Parser, Many Languages
Abstract
We train a language-universal dependency parser on a multilingual collection of treebanks. The parsing model uses multilingual word embeddings alongside learned and specified typological information, enabling generalization based on linguistic universals and based on typological similarities. We evaluate our parser’s performance on languages in the training set as well as on the unsupervised scenario where the target language has no trees in the training data, and find that multilingual training outperforms standard supervised training on a single language, and that generalization to unseen languages is competitive with existing model-transfer approaches.
Similar resources
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. F...
Many Languages, One Parser
We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize...
The Use of Predicates In LL(k) And LR(k) Parser Generators (Technical Summary)
Although existing LR(1) or LL(1) parser generators suffice for many language recognition problems, writing a straightforward grammar to translate a complicated language, such as C++ or even C, remains a non-trivial task. We have often found that adding translation actions to the grammar is harder than writing the grammar itself. Part of the problem is that many languages are context-sensitive...
Building a Persian Phrase-Structure Treebank by Automatic Conversion
Treebanks are among the most important and useful resources for natural language processing tasks. Dependency structure and phrase structure are the two best-known kinds of treebank annotation. Many efforts have already been made to convert dependency structure to phrase structure. In this paper we study an approach to converting dependency structure to phrase structure, motivated by the lack of a large phrase-structure treebank for Persian. A...